Equilibrated adaptive learning rates for non-convex optimization
Abstract
Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the critical points encountered when training such networks are saddle points, we examine how accounting for the presence of negative eigenvalues of the Hessian can help us design better-suited adaptive learning rate schemes. We show that the popular Jacobi preconditioner has undesirable behavior in the presence of both positive and negative curvature, and present theoretical and empirical evidence that the so-called equilibration preconditioner is comparatively better suited to non-convex problems. We introduce a novel adaptive learning rate scheme, called ESGD, based on the equilibration preconditioner. Our experiments show that ESGD performs as well as or better than RMSProp in terms of convergence speed, always clearly improving over plain stochastic gradient descent.
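To make the idea concrete, below is a minimal NumPy sketch of equilibrated SGD; it is an illustration under stated assumptions, not the authors' reference implementation. The equilibration preconditioner scales parameter i by sqrt(E[(Hv)_i^2]) for random Gaussian vectors v, which equals the l2 norm of row i of the Hessian H; this stays informative under negative curvature, whereas the Jacobi choice |H_ii| can be small even where curvature is large. In the sketch, Hessian-vector products are approximated by central finite differences of the gradient, and the esgd name, its damping and fd_eps parameters, and the toy quadratic objective are all assumptions made here.

```python
import numpy as np

def esgd(grad, theta, lr=0.01, damping=1e-4, steps=2000, fd_eps=1e-4, seed=0):
    # Equilibrated SGD sketch: precondition by sqrt(E[(Hv)^2]),
    # the row norms of the Hessian, estimated from Hessian-vector
    # products with random Gaussian vectors v ~ N(0, I).
    rng = np.random.default_rng(seed)
    acc = np.zeros_like(theta)              # running sum of (Hv)^2
    for t in range(1, steps + 1):
        v = rng.standard_normal(theta.shape)
        # Central finite-difference approximation of the product Hv
        Hv = (grad(theta + fd_eps * v) - grad(theta - fd_eps * v)) / (2.0 * fd_eps)
        acc += Hv ** 2
        D = np.sqrt(acc / t) + damping      # equilibration preconditioner
        theta = theta - lr * grad(theta) / D
    return theta

# Toy usage: an ill-conditioned convex quadratic f(x) = 0.5 x^T A x,
# where per-parameter curvatures differ by four orders of magnitude.
A = np.diag([1e2, 1.0, 1e-2])
x0 = np.array([1.0, 1.0, 1.0])
print(esgd(lambda x: A @ x, x0))            # approaches the minimum at 0
```

The damping term keeps steps bounded where the curvature estimate is near zero, and dividing the accumulator by t turns the running sum of (Hv)^2 into the expectation the preconditioner calls for.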
Similar resources
RMSProp and equilibrated adaptive learning rates for non-convex optimization
Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the critical points encountered when training such networks are saddle points, we find how considering the presence of negative eigenvalues of the Hessian could help u...
An Intelligent Approach Based on Meta-Heuristic Algorithm for Non-Convex Economic Dispatch
One of the significant strategies in power systems is the Economic Dispatch (ED) problem, which is defined as scheduling the optimal generation of power units to produce energy at the lowest cost while meeting demand within several limits. The undeniable impacts of ramp rate limits, valve loading, prohibited operating zones, spinning reserve and the multi-fuel option on the economic dispatch of practical p...
Sparse Regularized Deep Neural Networks For Efficient Embedded Learning
Deep learning is becoming more widespread in its application due to its power in solving complex classification problems. However, deep learning models often require large memory and energy consumption, which may prevent them from being deployed effectively on embedded platforms, limiting their applications. This work addresses the problem by proposing Weight Reduction Quantisation methods for ...
Algorithmic Connections between Active Learning and Stochastic Convex Optimization
Recent papers have established interesting theoretical associations between the fields of active learning and stochastic convex optimization, owing to the common role of feedback in sequential querying mechanisms. In this paper, we continue this thread in two parts by exploiting these relations for the first time to yield novel algorithms in both fields, further motivating the study of the...
Unifying Stochastic Convex Optimization and Active Learning
First-order stochastic convex optimization is an extremely well-studied area with a rich history of over a century of optimization research. Active learning is a relatively newer discipline that grew independently of the former, gaining popularity in the learning community over the last few decades due to its promising improvements over passive learning. Over the last year, we have uncovered co...
Publication date: 2015